Abstract

Within the literature on non-cooperative game theory, a number of algorithms have been proposed for computing Nash equilibria. This paper shows that the family of algorithms known as Markov chain Monte Carlo (MCMC) can also be used to calculate Nash equilibria. MCMC is a type of Monte Carlo simulation that relies on Markov chains to ensure its regularity conditions. MCMC has been widely used throughout the statistics and optimization literature, where variants of this algorithm include simulated annealing. This paper shows that there is an interesting connection between the trembles that underlie the functioning of this algorithm and the Nash refinement known as trembling hand perfection, and that simulated annealing can be used to compute this refinement.

Keywords: Trembling Hand Perfection, Equilibrium Selection and Computation, Simulated Annealing, Markov Chain Monte Carlo

1 Introduction

This paper develops an algorithm to compute a desired type of Nash equilibrium. Furthermore, we use this algorithm to show the existence and uniqueness of sensible Nash equilibria. Our novel approach to this problem has been motivated by the number of existing algorithms for computing equilibria. The general approach of this literature has been to rely on the geometric properties of the equilibrium.

This paper is interested in computing Nash equilibria that satisfy the refinement referred to as "trembling hand" perfection [17]. This paper shows that simulated annealing can be used to compute this refinement. Simulated annealing is a type of Monte Carlo sampling procedure that relies on Markov chains to ensure its regularity conditions. Most applications have concentrated on problems of combinatorial optimization, such as routing and packing problems, or on problems from statistical pattern recognition, such as image processing.

Another well-known group of algorithms for calculating perfect Nash equilibria is the tracing procedure of Harsanyi and Selten [7], where an outcome for the game is selected by "tracing" a feasible path through a family of auxiliary games. Progress along the feasible path is intended to represent the way in which players adjust their expectations and predictions about the play of the game.

A major limitation of the tracing procedure is that its logarithmic version does not always provide a path that leads to a perfect equilibrium. Harsanyi [6, p. 69] has argued that this problem can be resolved by eliminating all dominated pure strategies before applying the tracing procedure. However, van Damme [19, p. 77] constructs examples without dominated pure strategies in which the tracing procedure yields a non-perfect equilibrium. Furthermore, van Damme suggested that the inconsistency lies in the logarithmic control costs. Games with a control cost parameter are normal form games in which players incur costs depending on how well they choose to control their actions.

Another limitation of the tracing procedure is that it relies on the algebro-geometric properties of the equilibrium. This approach has been commonly used throughout the literature for computing the equilibria of non-cooperative games. For example, the algorithm of Lemke and Howson [?] for bimatrix games and the algorithms of Wilson [21] and Scarf [15] for n-person games also use the fundamental geometry of games to calculate equilibria. In general, these approaches to equilibrium calculation are computationally expensive.

However, within game theory there is a history of Monte Carlo methods being applied to solve non-cooperative games, starting with Ulam [18] in 1954. From the viewpoint of applying global optimization techniques to infinite games, Monte Carlo simulation has been used by Georgobiani and Torondzadze as a means of providing Nash equilibria for rectangular games. This is the approach that we develop in this paper.

This paper is organised as follows. The second section introduces the MCMC algorithm and discusses its convergence properties in terms of Markov chain theory. As a starting point for this discussion, the connection between MCMC sampling techniques and Monte Carlo sampling techniques is explored. The MCMC algorithms include the Gibbs sampler and the Metropolis algorithm; the latter, run with a decreasing temperature parameter, is often called simulated annealing. The third section characterizes these algorithms in terms of the trembles of trembling hand perfection. With this in mind, the fourth section provides an example of simulated annealing applied to calculating a Nash equilibrium, in which the solution leads to an equilibrium that results from trembling hand perfection.



2 A Review of Simulated Annealing 

Monte Carlo simulation has been used extensively for solving complicated problems that defy an analytic formulation. The main idea behind Monte Carlo simulation is either to construct a stochastic model that agrees analytically with the actual problem, or to simulate the problem directly. One problem with Monte Carlo methods arises when the underlying probability distribution is non-standard. It may then be impossible to draw independent samples from it, so the convergence of sample averages cannot be assured by the strong law of large numbers (SLLN). One way around this is to note that a law of large numbers can also hold for dependent samples, provided they are generated by a suitable process that draws its samples from the support of the underlying distribution. Markov chain Monte Carlo (MCMC) does this by constructing a Markov chain that has the underlying distribution as its stationary distribution. This enables the simulation of the stochastic process for non-standard distributions, while ensuring that the SLLN, in its ergodic form, will hold.

As an illustration of MCMC we will discuss the Metropolis algorithm [11]. In this algorithm, each iteration comprises $h$ updating steps. Let $X_{t,i}$ denote the state of $X_i$ at the end of the $t$th iteration. For step $i$ of iteration $t+1$, $X_i$ is updated using the Metropolis algorithm. The candidate $Y_i$ is generated from a proposal distribution $q_i(Y_i \mid X_{t,i}, X_{t,-i})$, where $X_{t,-i}$ denotes the value of

$$X_{-i} = \{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_h\}$$

after completing step $i-1$ of iteration $t+1$, i.e.

$$X_{t,-i} = \{X_{t+1,1}, \ldots, X_{t+1,i-1}, X_{t,i+1}, \ldots, X_{t,h}\},$$

where the components $X_{t,i+1}, \ldots, X_{t,h}$ have yet to be updated and the components $X_{t+1,1}, \ldots, X_{t+1,i-1}$ have already been updated. Thus the proposal distribution of the $i$th component, $q_i(\cdot \mid \cdot, \cdot)$, generates a candidate for only the $i$th component of $X$. The candidate is accepted with probability

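$$\alpha(X_{-i}, X_i, Y_i) = \min\left(1, \frac{\pi(Y_i \mid X_{-i})\, q_i(X_i \mid Y_i, X_{-i})}{\pi(X_i \mid X_{-i})\, q_i(Y_i \mid X_i, X_{-i})}\right),$$

where

$$\pi(X_i \mid X_{-i}) = \frac{\pi(X)}{\int \pi(X)\, dX_i}$$

is the full conditional distribution for $X_i$ under $\pi(\cdot)$. If $Y_i$ is accepted, then $X_{t+1,i} = Y_i$; otherwise $X_{t+1,i} = X_{t,i}$. The acceptance probability $\alpha(X_{-i}, X_i, Y_i)$ is known as the Metropolis criterion.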
One of the disadvantages of this algorithm is the complexity of the Metropolis criterion $\alpha(X_{-i}, X_i, Y_i)$. In practice, $\alpha(X_{-i}, X_i, Y_i)$ often simplifies considerably, particularly when $\pi(\cdot)$ derives from a conditional independence model [?, 14]. However, the single-component Metropolis algorithm has the advantage of employing the full conditional distributions of $\pi(\cdot)$, and Besag has shown that $\pi(\cdot)$ is uniquely determined by its full conditional distributions. As a result, $\alpha(X_{-i}, X_i, Y_i)$ will generate samples from a unique target distribution $\pi(\cdot)$.
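The single-component update can be illustrated with a short Python sketch. It is not taken from the original paper. The grid target, the log-density passed as `log_pi`, and the ±1 proposal are illustrative assumptions. Because the proposal is symmetric, the q-terms in α cancel. The ratio of full conditionals then reduces to π(Y)/π(X), since both conditionals share the same normalizing integral over X_i.

```python
import math
import random


def single_component_metropolis(log_pi, x0, support, n_iter, rng):
    """Systematic-scan single-component Metropolis sampler (sketch).

    log_pi:  log of the unnormalised target pi(x) (hypothetical input)
    x0:      starting values, one per component
    support: support[i] is the set of allowed values of component i
    Each iteration updates components 0, ..., h-1 in turn, as in the text.
    """
    x = list(x0)
    chain = []
    for _ in range(n_iter):
        for i in range(len(x)):
            y = list(x)
            # Symmetric proposal for component i only: one step left or right.
            y[i] = x[i] + rng.choice((-1, 1))
            if y[i] not in support[i]:
                continue  # pi(y) = 0 outside the support, so reject
            # pi(Y_i | X_-i) / pi(X_i | X_-i) = pi(y) / pi(x), because both
            # full conditionals share the same normalising integral.
            log_ratio = log_pi(y) - log_pi(x)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                x = y
        chain.append(tuple(x))
    return chain


# Usage: a product-form target on {0, 1, 2} x {0, 1, 2}.
rng = random.Random(1)
chain = single_component_metropolis(
    lambda x: 0.5 * x[0] - 0.3 * x[1], [0, 0], [range(3), range(3)], 50_000, rng)
weights = [math.exp(0.5 * k) for k in range(3)]
print(sum(s[0] == 2 for s in chain) / len(chain), weights[2] / sum(weights))
```

With this product-form target, the marginal of the first component is proportional to exp(0.5k). The two printed numbers should therefore agree to roughly two decimal places.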

An alternative approach for constructing a Markov chain with stationary distribution $\pi(\cdot)$, which generalizes the approach of Metropolis et al. [11], has been suggested by Hastings [5]. At each time $t$, the next state $X_{t+1}$ is chosen by first sampling a candidate point $Y$ from a proposal distribution $q(\cdot \mid X_t)$. The candidate point $Y$ is then accepted in accordance with the criterion

$$\alpha(X, Y) = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right).$$

Under this criterion, if the candidate point is accepted, then $X_{t+1} = Y$; otherwise $X_{t+1} = X_t$. This algorithm is known as the Metropolis-Hastings algorithm. The main difference from the algorithm of Metropolis et al. is that the latter assumes symmetric proposal distributions, i.e. $q(Y \mid X) = q(X \mid Y)$. In that case the proposal terms cancel and $\alpha(X, Y) = \min(1, \pi(Y)/\pi(X))$. Requiring symmetry can be restrictive in higher dimensional problems, where good proposals often have little symmetry. The main advantage of the Metropolis-Hastings algorithm is that its criterion corrects for any asymmetry in the proposal. As a result, the choice of proposal distribution does not affect the stationary distribution $\pi(\cdot)$, although it does affect the rate of convergence towards it.
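Here is a minimal sketch of the Metropolis-Hastings criterion with a deliberately asymmetric proposal. The target weights and the independence proposal `q` are hypothetical. The Hastings ratio includes q(X | Y)/q(Y | X), and that factor is what keeps π as the stationary distribution.

```python
import random


def metropolis_hastings(weights, q, n_steps, x0, rng):
    """Independence Metropolis-Hastings sampler on {0, ..., K-1} (sketch).

    weights: unnormalised target pi (all entries positive)
    q:       proposal probabilities; q(Y | X) = q[Y], which is not symmetric
    Returns the empirical frequency of each state.
    """
    states = range(len(q))
    x = x0
    counts = [0] * len(weights)
    for _ in range(n_steps):
        y = rng.choices(states, weights=q)[0]
        # Hastings ratio pi(Y) q(X | Y) / (pi(X) q(Y | X)) corrects for the
        # asymmetric proposal; dropping the q-terms would bias the chain.
        ratio = (weights[y] * q[x]) / (weights[x] * q[y])
        if rng.random() < min(1.0, ratio):
            x = y
        counts[x] += 1
    return [c / n_steps for c in counts]


pi = [1, 2, 3, 4, 5]
q = [0.4, 0.3, 0.1, 0.1, 0.1]
freqs = metropolis_hastings(pi, q, 100_000, 0, random.Random(2))
print([round(f, 3) for f in freqs])  # close to [w / 15 for w in pi]
```

The proposal here favours the low-probability states, yet the empirical frequencies still track π.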

To provide a fuller explanation, the transition kernel of the Metropolis-Hastings algorithm is given by

$$P(X_{t+1} \mid X_t) = q(X_{t+1} \mid X_t)\, \alpha(X_t, X_{t+1}) + I(X_{t+1} = X_t)\left[1 - \int q(Y \mid X_t)\, \alpha(X_t, Y)\, dY\right], \qquad (2.1)$$

where $I(\cdot)$ is the indicator function. From the definition of $\alpha(X_t, X_{t+1})$, we can see that

$$\pi(X_t)\, q(X_{t+1} \mid X_t)\, \alpha(X_t, X_{t+1}) = \pi(X_{t+1})\, q(X_t \mid X_{t+1})\, \alpha(X_{t+1}, X_t).$$

This implies the detailed balance condition

$$\pi(X_t)\, P(X_{t+1} \mid X_t) = \pi(X_{t+1})\, P(X_t \mid X_{t+1}).$$

Integrating both sides of this equation with respect to $X_t$, we get

$$\int \pi(X_t)\, P(X_{t+1} \mid X_t)\, dX_t = \pi(X_{t+1}).$$

This equation states that if $X_t$ is drawn from $\pi$, then so is $X_{t+1}$. In other words, once one sample value has been obtained from the stationary distribution, all subsequent samples will be drawn from the same distribution.
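The kernel (2.1) and the balance argument can be checked numerically on a finite state space, where the integral becomes a sum. In this sketch `mh_kernel` is a hypothetical helper. The four-state target and the proposal matrix are arbitrary illustrative values.

```python
def mh_kernel(pi, q):
    """Metropolis-Hastings transition matrix on a finite space, following (2.1).

    pi: target probabilities (all positive)
    q:  proposal matrix, q[x][y] = q(y | x)
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x and q[x][y] > 0:
                alpha = min(1.0, pi[y] * q[y][x] / (pi[x] * q[x][y]))
                P[x][y] = q[x][y] * alpha
        # Indicator term of (2.1): q(x | x) plus all rejected mass stays at x.
        P[x][x] = 1.0 - sum(P[x])
    return P


pi = [0.1, 0.2, 0.3, 0.4]
q = [[0.10, 0.60, 0.20, 0.10],
     [0.30, 0.10, 0.30, 0.30],
     [0.25, 0.25, 0.25, 0.25],
     [0.50, 0.20, 0.20, 0.10]]
P = mh_kernel(pi, q)
n = len(pi)
# Detailed balance: pi(x) P(y | x) == pi(y) P(x | y) for every pair of states.
balance_gap = max(abs(pi[x] * P[x][y] - pi[y] * P[y][x])
                  for x in range(n) for y in range(n))
# Stationarity: one step of the chain started from pi returns pi.
pi_next = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
print(balance_gap, pi_next)
```

Both checks hold up to floating-point rounding, even though the proposal matrix is far from symmetric.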

This is only a partial justification of the Metropolis-Hastings algorithm. A full proof requires that $P^{(t)}(X_t \mid X_0)$ converges on the stationary distribution. For a heuristic justification of this result, note that this distribution depends on the starting value $X_0$. The proof must therefore show that the Markov chain gradually forgets its starting point and converges on a unique stationary distribution. Thus, after a sufficiently long burn-in of $m$ iterations, the points $\{X_t;\ t = m+1, \ldots, n\}$ will be dependent samples approximately drawn from the stationary distribution. Hence the burn-in sample is usually discarded when calculating the ergodic mean of $f(X)$,

$$\bar{f} = \frac{1}{n-m} \sum_{t=m+1}^{n} f(X_t).$$
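The role of the burn-in can be seen in a small random-walk Metropolis run. This is a sketch under assumed settings. The target on {0, ..., 9}, the deliberately poor starting value x0 = 9, and the burn-in length m = 1000 are illustrative choices.

```python
import math
import random


def ergodic_mean(f, log_pi, lo, hi, x0, n, m, rng):
    """Random-walk Metropolis on the integers lo, ..., hi (sketch).

    Runs n iterations from x0, discards the first m draws as burn-in and
    returns the ergodic mean of f over X_{m+1}, ..., X_n.
    """
    x = x0
    total = 0.0
    for t in range(1, n + 1):
        y = x + rng.choice((-1, 1))  # symmetric proposal
        if lo <= y <= hi:            # proposals outside the support are rejected
            d = log_pi(y) - log_pi(x)
            if d >= 0 or rng.random() < math.exp(d):
                x = y
        if t > m:                    # keep only post-burn-in draws
            total += f(x)
    return total / (n - m)


def log_pi(x):
    """Unnormalised log target: pi(x) proportional to exp(-x / 2)."""
    return -0.5 * x


def identity(x):
    return x


estimate = ergodic_mean(identity, log_pi, 0, 9, 9, 200_000, 1_000, random.Random(3))
w = [math.exp(log_pi(x)) for x in range(10)]
exact = sum(x * wx for x, wx in enumerate(w)) / sum(w)
print(round(estimate, 3), round(exact, 3))
```

The chain starts in the tail of the target. Discarding the first m draws removes the early values, which reflect the starting point rather than π.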
3 Trembling Hand Algorithm

3.1 An MCMC Algorithm for Computing Perfect Equilibria in Strategic Games

In this sub-section we provide an algorithm for computing a perfect equilibrium of a strategic game. We also show that this algorithm provides a sequence of perturbed mixed strategies that will eventually converge on a perfect equilibrium. The basic idea is to construct a Markov chain and then use this Markov chain to deliver a Nash equilibrium via Markov chain approximation. The trick is to select the Markov chain whose convergence properties are best suited to this task. The chain must deliver convergence of a sequence of completely mixed Nash equilibria of perturbed games, or of $\varepsilon$-perfect equilibria, to a perfect equilibrium. This is the objective of this section.

Consider an $n$-person game in strategic form $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$. Here $N = \{1, \ldots, n\}$ is the player set, and each player $i \in N$ has a finite set of pure strategies $S_i = \{s_{i1}, \ldots, s_{ik_i}\}$. Each player also has a payoff function $u_i : \times_{i \in N} S_i \to \mathbb{R}$ that maps the set of pure strategy profiles $\times_{i \in N} S_i$ into the real line.

In the strategic game $G$, for each player $i \in N$ there is a set of probability measures $\Delta_i$ defined over the pure strategy set $S_i$; this is player $i$'s mixed strategy set. The elements of $\Delta_i$ are of the form $p_i : S_i \to [0, 1]$, where $\sum_{j=1}^{k_i} p_{ij} = 1$ with $p_{ij} = p_i(s_{ij})$, i.e. $\Delta_i$ is isomorphic to the unit simplex.

We denote the elements of the space of mixed strategy profiles $\times_{i \in N} \Delta_i$ by $p = (p_1, \ldots, p_n)$, where $p_i = (p_{i1}, \ldots, p_{ik_i}) \in \Delta_i$. As is the convention, we use the shorthand notation $p = (p_i, p_{-i})$, where $p_{-i}$ denotes the other components of $p$.

For each player $i$, the payoff function can be extended to the domain of mixed strategy profiles, $u_i : \times_{i \in N} \Delta_i \to \mathbb{R}$, by defining $u_i(p_i, p_{-i}) = \sum_{j=1}^{k_i} p_{ij}\, u_i(s_{ij}, p_{-i})$. A mixed strategy profile $p \in \times_{i \in N} \Delta_i$ is a Nash equilibrium of the strategic game $G$ if, for all players $i \in N$ and all $p_i' \in \Delta_i$,

$$u_i(p_i, p_{-i}) \ge u_i(p_i', p_{-i}). \qquad (3.1)$$
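Because u_i is linear in p_i, condition (3.1) only needs to be checked against pure deviations. The sketch below uses hypothetical helper names. It computes u_i for an n-player game and applies that check, with matching pennies as a purely illustrative example.

```python
import itertools


def mixed_payoff(i, payoff, p):
    """u_i(p): expected payoff of player i under the mixed profile p.

    payoff(s) returns the payoff tuple for a pure profile s (a tuple of
    strategy indices); p[k][j] is the probability p_kj.
    """
    total = 0.0
    for s in itertools.product(*(range(len(pk)) for pk in p)):
        prob = 1.0
        for k, sk in enumerate(s):
            prob *= p[k][sk]
        total += prob * payoff(s)[i]
    return total


def pure_payoff(i, j, payoff, p):
    """u_i(s_ij, p_-i): player i plays pure strategy j, the others follow p."""
    q = list(p)
    q[i] = [1.0 if l == j else 0.0 for l in range(len(p[i]))]
    return mixed_payoff(i, payoff, q)


def is_nash(payoff, p, tol=1e-9):
    """Condition (3.1), checked against every pure deviation s_ij."""
    for i in range(len(p)):
        u = mixed_payoff(i, payoff, p)
        if any(pure_payoff(i, j, payoff, p) > u + tol for j in range(len(p[i]))):
            return False
    return True


def pennies(s):
    """Matching pennies, used only as an illustration."""
    return (1, -1) if s[0] == s[1] else (-1, 1)


print(is_nash(pennies, [[0.5, 0.5], [0.5, 0.5]]))  # True
print(is_nash(pennies, [[1.0, 0.0], [0.5, 0.5]]))  # False
```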

Suppose that, as well as there being a positive probability $p_{ij}$ of player $i$ selecting a pure strategy $s_{ij} \in S_i$, there is a small probability $\varepsilon_{ij}$ that player $i$ makes an error. If player $i$ makes an error, the probability of selecting his $j$th pure strategy by mistake is given by $q_{ij}$. The total probability of player $i$ selecting a pure strategy $s_{ij} \in S_i$ is then given by

$$\hat{p}_{ij} = (1 - \varepsilon_{ij})\, p_{ij} + \varepsilon_{ij}\, q_{ij}. \qquad (3.2)$$

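It can be seen that in this case, the total probability of player $i$ selecting a pure strategy $s_{ij} \in S_i$ is bounded below by

$$\hat{p}_{ij} \ge \varepsilon_{ij}\, q_{ij}. \qquad (3.3)$$

Setting $\eta_{ij} = \varepsilon_{ij} q_{ij}$, we can see that this condition can be rewritten as

$$\hat{p}_{ij} \ge \eta_{ij} \quad \forall\, s_{ij} \in S_i \text{ and } i \in N, \qquad (3.4)$$

with

$$\sum_{j=1}^{k_i} \eta_{ij} < 1 \quad \forall\, i \in N. \qquad (3.5)$$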
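The tremble construction (3.2) to (3.5) is easy to compute directly. In this sketch the error probability is assumed to be common to all of player i's strategies ($\varepsilon_{ij} = \varepsilon_i$). With j-specific errors, the right-hand side of (3.2) need not sum to one. The function name `tremble` is hypothetical.

```python
def tremble(p, eps, q):
    """Perturbed choice probabilities (3.2) for one player (sketch).

    p:   intended mixed strategy p_i
    eps: error probability, assumed common to all of i's strategies
         (eps_ij = eps_i) so that the result is again a distribution
    q:   q[j], probability of choosing s_ij given that an error occurs
    Returns (p_hat, eta) with eta_j = eps * q_j, the lower bounds in (3.4).
    """
    p_hat = [(1 - eps) * pj + eps * qj for pj, qj in zip(p, q)]
    eta = [eps * qj for qj in q]
    return p_hat, eta


p_hat, eta = tremble([1.0, 0.0, 0.0], 0.1, [1 / 3] * 3)
print(p_hat)  # approximately [0.9333, 0.0333, 0.0333]
print(all(a >= b for a, b in zip(p_hat, eta)), sum(eta) < 1)  # True True
```

Even a player who intends a pure strategy ends up completely mixed, with every strategy played with probability at least its $\eta_{ij}$.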

This leads to the definition of a perturbed game $(G, \eta)$ as a finite strategic game derived from the strategic game $G$. In $(G, \eta)$, each player $i$'s mixed strategy set is the set of completely mixed strategies for player $i$, constrained by the probability of making an error:

$$\Delta_i(\eta_i) = \Big\{p_i = (p_{i1}, \ldots, p_{ik_i}) \in \Delta_i :\ p_{ij} \ge \eta_{ij} \text{ and } \sum_{j=1}^{k_i} \eta_{ij} < 1\Big\}. \qquad (3.6)$$



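A mixed strategy combination $p$ in the strategy space of the perturbed game $(G, \eta)$ is a Nash equilibrium of $(G, \eta)$ iff the following condition is satisfied for every player $i \in N$:

$$u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \implies p_{ij} = \eta_{ij}, \quad \forall\, s_{ij}, s_{il} \in S_i. \qquad (3.7)$$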
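For a two-player game, condition (3.7) can be checked as follows. The helper names and the 2×2 game are illustrative assumptions. In this game (T, L) and (B, R) are both Nash equilibria of G, but for small η the only profile satisfying (3.7) is the one close to (T, L).

```python
def payoffs_vs(M, other):
    """u_i(s_ij, p_-i) for each pure strategy j in a two-player game;
    M[j][l] is player i's payoff when i plays j and the opponent plays l."""
    return [sum(m * ql for m, ql in zip(row, other)) for row in M]


def is_perturbed_nash(A, B, p1, p2, eta1, eta2, tol=1e-9):
    """Check condition (3.7) for the perturbed game (G, eta).

    A and B are the payoff matrices of players 1 and 2 (rows: player 1's
    strategies). Every pure strategy that is not a best response must be
    played with exactly its minimum probability eta_ij.
    """
    B_T = [list(col) for col in zip(*B)]  # rows: player 2's strategies
    for M, own, other, eta in ((A, p1, p2, eta1), (B_T, p2, p1, eta2)):
        if any(pj < ej - tol for pj, ej in zip(own, eta)):
            return False  # outside the perturbed strategy set (3.6)
        u = payoffs_vs(M, other)
        best = max(u)
        for j, uj in enumerate(u):
            if uj < best - tol and abs(own[j] - eta[j]) > tol:
                return False
    return True


# Illustrative game: (T, L) and (B, R) are both Nash equilibria of G.
A = [[1, 0], [0, 0]]
B = [[1, 0], [0, 0]]
eta = [0.01, 0.01]
print(is_perturbed_nash(A, B, [0.99, 0.01], [0.99, 0.01], eta, eta))  # True
print(is_perturbed_nash(A, B, [0.01, 0.99], [0.01, 0.99], eta, eta))  # False
```

Whenever L is played with positive probability, T does strictly better than B for player 1 (and symmetrically for player 2). The trembles therefore push the perturbed equilibrium towards (T, L).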
A mixed strategy profile $p \in \times_{i \in N} \Delta_i$ is a perfect equilibrium of the strategic game $G$ if there exists a sequence of completely mixed strategy profiles $\{p^k\}_{k=1}^{\infty}$ with $\lim_{k \to \infty} p^k = p$ such that, for every player $i \in N$ and every $p_i' \in \Delta_i$,

$$u_i(p_i, p^k_{-i}) \ge u_i(p_i', p^k_{-i}) \quad \forall\, k = 1, 2, \ldots. \qquad (3.8)$$

In terms of our definition of a perturbed game, a mixed strategy profile $p$ is a perfect equilibrium iff there exist sequences $\{\eta^k = (\eta^k_i)_{i \in N}\}_{k=1}^{\infty}$ and $\{p^k = (p^k_i)_{i \in N}\}_{k=1}^{\infty}$ such that

1. each $\eta^k > 0$ and $\lim_{k \to \infty} \eta^k = 0$,

2. each $p^k$ is a Nash equilibrium of the perturbed game $(G, \eta^k)$, and

3. $\lim_{k \to \infty} p^k = p$, where for every player $i \in N$ and every $p_i' \in \Delta_i$

$$u_i(p_i, p^k_{-i}) \ge u_i(p_i', p^k_{-i}) \quad \forall\, k = 1, 2, \ldots. \qquad (3.9)$$

An alternative definition of perfection has been given by Myerson [12, pp. 75-76]. It is based on the idea that every pure strategy in a player's set of pure strategies is played with positive probability, but strategies that are not best responses are played with probability at most $\varepsilon > 0$. More formally, a mixed strategy profile $p$ is an $\varepsilon$-perfect equilibrium iff it is completely mixed and, for every player $i \in N$,

$$u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \implies p_{ij} \le \varepsilon, \quad \forall\, s_{ij}, s_{il} \in S_i. \qquad (3.10)$$

The $\varepsilon$-perfect equilibria of a game $G$ need not be Nash equilibria of $G$. However, Myerson does show that $p = (p_1, \ldots, p_n) \in \times_{i \in N} \Delta_i$ is a perfect equilibrium iff there exist sequences $\{\varepsilon^k\}_{k=1}^{\infty}$ and $\{p^k\}_{k=1}^{\infty}$ such that

1. each $\varepsilon^k > 0$ and $\lim_{k \to \infty} \varepsilon^k = 0$,

2. each $p^k$ is an $\varepsilon^k$-perfect equilibrium of the game $G$, and

3. $\lim_{k \to \infty} p^k_i = p_i$ for every player $i \in N$.
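Myerson's condition (3.10) can be checked in the same spirit. This sketch uses hypothetical helper names. It builds a sequence of ε-perfect profiles for an illustrative 2×2 game in which (T, L) is perfect and the weakly dominated equilibrium (B, R) is not.

```python
def payoffs_vs(M, other):
    """u_i(s_ij, p_-i) for each pure strategy j (two-player game)."""
    return [sum(m * ql for m, ql in zip(row, other)) for row in M]


def is_eps_perfect(A, B, p1, p2, eps, tol=1e-12):
    """Myerson's condition (3.10) for a two-player game: the profile must be
    completely mixed, and any pure strategy that is not a best response
    may carry probability at most eps."""
    B_T = [list(col) for col in zip(*B)]
    for M, own, other in ((A, p1, p2), (B_T, p2, p1)):
        if any(x <= 0 for x in own):
            return False  # not completely mixed
        u = payoffs_vs(M, other)
        best = max(u)
        if any(uj < best - tol and own[j] > eps + tol for j, uj in enumerate(u)):
            return False
    return True


A = [[1, 0], [0, 0]]  # (T, L) is perfect; (B, R) is Nash but not perfect
B = [[1, 0], [0, 0]]
for eps in (0.1, 0.01, 0.001):
    good = is_eps_perfect(A, B, [1 - eps, eps], [1 - eps, eps], eps)
    bad = is_eps_perfect(A, B, [eps, 1 - eps], [eps, 1 - eps], eps)
    print(eps, good, bad)  # True False on every line
```

The ε-perfect profiles converge to (T, L) as ε shrinks. This matches conditions 1 to 3 above.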

The starting point for the MCMC algorithm for calculating perfect equilibria will be to follow Myerson by constructing a sequence of $\varepsilon$-perfect equilibria for the strategic game $G$. As stated above, $p \in \times_{i \in N} \Delta_i$ is an $\varepsilon$-perfect equilibrium of $G$ iff, for each player $i \in N$, $p_i \in \Delta_i$ is a completely mixed strategy and

$$u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \implies p_{ij} \le \varepsilon, \quad \forall\, s_{ij}, s_{il} \in S_i. \qquad (3.11)$$

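Following Myerson [12, p. 79], we define the following set of mixed strategies for each player $i \in N$:

$$\Delta_i^* = \{p_i \in \Delta_i :\ p_{ij} \ge \delta \ \ \forall\, s_{ij} \in S_i\}, \qquad (3.12)$$

where

$$\delta = \frac{\varepsilon^m}{m}, \quad 0 < \varepsilon < 1, \qquad (3.13)$$

with $m = \max_{i \in N} |S_i|$. We then define a point-to-set mapping $F_i : \times_{i \in N} \Delta_i^* \to \Delta_i^*$ that assigns a family of completely mixed distributions contained in $\Delta_i^*$:

$$F_i(p_1, \ldots, p_n) = \{p_i' \in \Delta_i^* :\ u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \implies p_{ij}' \le \varepsilon, \ \ \forall\, s_{ij}, s_{il} \in S_i\}. \qquad (3.14)$$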
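If we then define, for each player $i \in N$, the mixed strategy

$$p^*_{ij} = \frac{\varepsilon^{\rho(s_{ij})}}{\sum_{l=1}^{k_i} \varepsilon^{\rho(s_{il})}}, \qquad (3.15)$$

where

$$\rho(s_{ij}) = \big|\{s_{il} \in S_i :\ u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i})\}\big|, \quad p \in \times_{i \in N} \Delta_i^*, \qquad (3.16)$$

then it can be seen that $p_i^* \in F_i(p_1, \ldots, p_n)$, so $F_i(p_1, \ldots, p_n)$ is non-empty. To see this, note that every best response has $\rho(s_{ij}) = 0$, so the denominator in (3.15) is at least 1. Any strategy that is not a best response has $\rho(s_{ij}) \ge 1$ and hence $p^*_{ij} \le \varepsilon$. Moreover, $p^*_{ij} \ge \varepsilon^{k_i - 1}/k_i \ge \delta$. Each $F_i(p_1, \ldots, p_n)$ is defined by a finite collection of linear inequalities, so it is also a closed convex set. In addition, by the continuity of the payoff function $u_i(s_{ij}, \cdot)$, each $F_i$ is upper semi-continuous.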
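The construction (3.15) to (3.16) can be verified numerically for a single player. The payoff vector below is hypothetical, and m is taken to be this player's number of pure strategies. The sketch confirms two properties. Strategies that are not best responses receive at most ε, and every strategy receives at least δ = ε^m/m.

```python
def myerson_strategy(u, eps):
    """Mixed strategy (3.15) for one player (sketch).

    u[j] = u_i(s_ij, p_-i). rho(s_ij) in (3.16) counts the pure strategies
    that do strictly better than s_ij; p*_ij is proportional to eps ** rho.
    """
    rho = [sum(1 for ul in u if uj < ul) for uj in u]
    weights = [eps ** r for r in rho]
    total = sum(weights)
    return [w / total for w in weights], rho


u = [1.0, 3.0, 2.0, 3.0]  # hypothetical payoffs of four pure strategies
eps = 0.1
p_star, rho = myerson_strategy(u, eps)
m = len(u)                # a single player here, so m = |S_i| = 4
delta = eps ** m / m      # lower bound (3.13)
print(rho)                                                  # [3, 0, 2, 0]
print(all(p <= eps for p, r in zip(p_star, rho) if r > 0))  # True
print(min(p_star) >= delta)                                 # True
```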

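As a consequence, the mapping $F = \times_{i \in N} F_i : \times_{i \in N} \Delta_i^* \to \times_{i \in N} \Delta_i^*$ satisfies all the conditions of the Kakutani Fixed Point Theorem. Hence there exists some completely mixed strategy profile $p^\varepsilon \in \times_{i \in N} \Delta_i^*$ that is a fixed point of $F$, and $p^\varepsilon$ is an $\varepsilon$-perfect equilibrium of $G$. As $\times_{i \in N} \Delta_i$ is compact, the sequence of $\varepsilon$-perfect equilibria has a convergent subsequence $p^\varepsilon \to p$ as $\varepsilon \to 0$, where $p$ is a perfect equilibrium of $G$.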

An alternative route to the same result uses an argument based on the convergence properties of Markov chains.

Theorem 3.1. For any normal form game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$, it is possible to define an MCMC algorithm whose transition probabilities will converge to a perfect equilibrium as long as the following conditions hold:

1. if $u_i(s_{ij}, p^k_{-i}) - u_i(s_{il}, p^k_{-i}) > 0$ then accept, where $p^k_{-i}$ is the tuple of mixed strategies selected on the $k$th iteration;

2. otherwise, accept if $\exp\big([u_i(s_{ij}, p^k_{-i}) - u_i(s_{il}, p^k_{-i})]/T\big) > \xi$, where $\xi \sim U[0, 1]$; and

3. for all $s_{ij}$ and $s_{il} \in S_i$ such that $u_i(s_{ij}, p^k_{-i}) < u_i(s_{il}, p^k_{-i})$, the acceptance probability $\alpha_{lj}(T)$ of a move from $s_{il}$ to $s_{ij}$ satisfies $\alpha_{lj}(T) \to 0$ as $T \downarrow 0$.

Proof. For each player $i \in N$, there will be a collection of subsets

$$N_{ij} = \{s_{il} \in S_i :\ u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \text{ and } p \in \times_{i \in N} \Delta_i^*\} \qquad (3.17)$$

of $i$'s pure strategy space $S_i$. The collection of these sets will be referred to as player $i$'s local neighborhood structure. For any two pure strategies $s_{ij}, s_{il} \in S_i$, we would like to define a path from $s_{ij}$ to $s_{il}$ such that

$$s_{ij_1} \in N_{ij},\ s_{ij_2} \in N_{ij_1},\ \ldots,\ s_{il} \in N_{ij_m}. \qquad (3.18)$$
In order to do this, we observe that the point-to-set mapping defined by

$$F_i(p_1, \ldots, p_n) = \{p_i \in \Delta_i^* :\ u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \implies p_{ij} \le \varepsilon, \ \ \forall\, s_{ij}, s_{il} \in S_i\} \qquad (3.19)$$

is a collection of homogeneous transition probabilities on $S_i$,

$$p^i_{jl}(k) = \Pr\{s_i(k) = s_{il} \mid s_i(k-1) = s_{ij}\} = \Pr\{s_{il} \mid s_{ij}\}. \qquad (3.20)$$

Furthermore, we can see that these transition probabilities have the Markov property, i.e. given the path from $s_{ij}$ to $s_{il}$ such that

$$s_{ij_1} \in N_{ij},\ s_{ij_2} \in N_{ij_1},\ \ldots,\ s_{il} \in N_{ij_m}, \qquad (3.21)$$

the conditional probability satisfies

$$\Pr\{s_{il}, s_{ij_m}, \ldots, s_{ij_1} \mid s_{ij}\} = \Pr\{s_{il} \mid s_{ij_m}\}\, \Pr\{s_{ij_m} \mid s_{ij_{m-1}}\} \cdots \Pr\{s_{ij_1} \mid s_{ij}\}. \qquad (3.22)$$
We define the following generation probability for the Markov chain of each player $i \in N$:

$$g_{jl} = \begin{cases} 1/\rho(s_{ij}), & \text{if } s_{il} \in N_{ij}, \\ 0, & \text{otherwise,} \end{cases} \qquad (3.23)$$

where

$$\rho(s_{ij}) = \big|\{s_{il} \in S_i :\ u_i(s_{ij}, p_{-i}) < u_i(s_{il}, p_{-i}) \text{ and } p \in \times_{i \in N} \Delta_i^*\}\big|. \qquad (3.24)$$
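The neighbourhoods (3.17) and the generation probabilities (3.23) to (3.24) can be tabulated for a single player as below. The payoff vector is hypothetical. Rows for best responses are left at zero, since (3.23) leaves them undefined.

```python
def generation_matrix(u):
    """Neighbourhoods (3.17) and generation probabilities (3.23), one player.

    u[j] = u_i(s_ij, p_-i). N_ij holds the strictly better pure strategies,
    rho(s_ij) = |N_ij| as in (3.24), and g[j][l] = 1 / rho(s_ij) for s_il in
    N_ij. Rows for best responses (N_ij empty) are left at zero because
    (3.23) does not define them.
    """
    k = len(u)
    neighbours = [[l for l in range(k) if u[j] < u[l]] for j in range(k)]
    g = [[0.0] * k for _ in range(k)]
    for j, nbrs in enumerate(neighbours):
        for l in nbrs:
            g[j][l] = 1.0 / len(nbrs)
    return neighbours, g


neighbours, g = generation_matrix([1.0, 3.0, 2.0])
print(neighbours)  # [[1, 2], [], [1]]
print(g)           # [[0.0, 0.5, 0.5], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```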
We now introduce the following acceptance probability for a move from $s_{il}$ to the candidate $s_{ij}$:

$$\alpha_{lj}(T) = \min\left\{1, \exp\left(\frac{u_i(s_{ij}, p^k_{-i}) - u_i(s_{il}, p^k_{-i})}{T}\right)\right\}, \qquad (3.25)$$

where $T$ is a control parameter. This acceptance criterion implies that

1. if $u_i(s_{ij}, p^k_{-i}) - u_i(s_{il}, p^k_{-i}) > 0$ then accept, where $p^k_{-i}$ is the tuple of mixed strategies selected on the $k$th iteration;

2. otherwise, accept if $\exp\big([u_i(s_{ij}, p^k_{-i}) - u_i(s_{il}, p^k_{-i})]/T\big) > \xi$, where $\xi \sim U[0, 1]$; and

3. for all $s_{ij}$ and $s_{il} \in S_i$ such that $u_i(s_{ij}, p^k_{-i}) < u_i(s_{il}, p^k_{-i})$, $\alpha_{lj}(T) \to 0$ as $T \downarrow 0$.

Given these three conditions, we can now see that the following will hold:

• Under this acceptance criterion, consider the homogeneous Markov chain generated by the game $G$, with transition probability matrix $P^i$. As $k \to \infty$, this chain will converge on a stationary distribution $\pi^i(T)$ with

$$\pi^i_j(T) = \frac{e^{-C(s_{ij})/T}}{\sum_{l=1}^{k_i} e^{-C(s_{il})/T}}, \qquad (3.26)$$

where the cost is $C(s_{ij}) = -u_i(s_{ij}, p_{-i})$, and as $T \downarrow 0$

$$\pi^i_j(T) \to \begin{cases} 1/|N_i^*|, & \text{if } s_{ij} \in N_i^*, \\ 0, & \text{otherwise,} \end{cases} \qquad (3.27)$$

where

$$N_i^* = \{s_{ij} \in S_i :\ \rho(s_{ij}) = 0\} \qquad (3.28)$$

is the set of player $i$'s pure best responses to $p_{-i}$. (See van Laarhoven and Aarts [20, pp. 22-25] for the proof of this last statement.)

• The transition probability matrix $P^i$ satisfies Myerson's definition of an $\varepsilon$-perfect equilibrium. As Myerson has shown, the fixed point that this sequence converges on is also a perfect equilibrium. □
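The limit (3.27) can be illustrated with the following sketch. It swaps in an assumption not made in the text. Instead of the generation probabilities (3.23), the candidate strategy is drawn uniformly from the other pure strategies. With this symmetric proposal and the acceptance rule (3.25), the chain satisfies detailed balance with respect to (3.26). The distribution can therefore be checked exactly before letting T shrink. The payoff vector is hypothetical.

```python
import math


def boltzmann(u, T):
    """Stationary distribution (3.26) with cost C(s_ij) = -u_i(s_ij, p_-i)."""
    top = max(u)  # shift by the maximum to avoid overflow when T is small
    w = [math.exp((uj - top) / T) for uj in u]
    z = sum(w)
    return [x / z for x in w]


def metropolis_matrix(u, T):
    """Transition matrix over S_i: propose one of the other pure strategies
    uniformly at random (a symmetric proposal assumed for this sketch) and
    accept with probability min(1, exp((u_new - u_cur) / T)), as in (3.25)."""
    k = len(u)
    P = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for l in range(k):
            if l != j:
                d = u[l] - u[j]
                accept = 1.0 if d >= 0 else math.exp(d / T)
                P[j][l] = accept / (k - 1)
        P[j][j] = 1.0 - sum(P[j])  # rejected proposals stay at s_ij
    return P


u = [1.0, 3.0, 2.0, 3.0]
T = 0.5
pi, P = boltzmann(u, T), metropolis_matrix(u, T)
pi_next = [sum(pi[j] * P[j][l] for j in range(4)) for l in range(4)]
print(max(abs(a - b) for a, b in zip(pi, pi_next)))  # ~0: (3.26) is stationary
print([round(x, 6) for x in boltzmann(u, 0.01)])     # [0.0, 0.5, 0.0, 0.5]
```

As T falls, the mass concentrates uniformly on the two best responses, as (3.27) states.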



4 An Application to Extensive Form Games 

There are problems with viewing the existence of Nash equilibria as an end in itself. The most immediate problem is the possibly large number of Nash equilibria that a game can have. It is also likely that not all of these equilibria will be reasonable in some sense. One way around this is to view the decision process of each agent participating in the game from a decision-theoretic perspective. From this viewpoint, only those equilibria that can be found by backwards induction will be self-enforcing. This leads to a technique for strategy space reduction that iteratively removes strongly dominated strategies. As shown by Kuhn [?, Corollary 1], under the assumption of perfect information, this leads to a recursion that is equivalent to the Bellman equation of dynamic programming.
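The strong-dominance reduction can be sketched for a two-player normal form game. This is a simplified illustration, not Kuhn's extensive-form recursion. Only domination by another pure strategy is checked, and the prisoner's dilemma payoffs are illustrative.

```python
def iterated_strict_dominance(A, B):
    """Iteratively delete pure strategies that are strictly dominated by
    another pure strategy in a two-player game (sketch; domination by mixed
    strategies is not checked). A and B hold the row and column player's
    payoffs, indexed [row][column]. Returns the surviving indices."""
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for r in rows:
            if any(all(A[r2][c] > A[r][c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
                break
        for c in cols:
            if any(all(B[r][c2] > B[r][c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols


# Prisoner's dilemma (strategy 0 = cooperate, 1 = defect).
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(iterated_strict_dominance(A, B))  # ([1], [1])
```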

An alternative is to construct a recursion that iteratively eliminates weakly dominated strategies. However, removing weakly dominated strategies can eliminate strategy profiles that would have provided suitable outcomes had only strongly dominated strategies been removed. From the viewpoint of this paper, these recursive strategy space reduction techniques can be considered algorithms that reduce the size of a game, making equilibrium selection easier. However, these iterative reduction techniques become unwieldy once the assumption of perfect information is relaxed and information sets contain more than one node of the game tree.

This has led to a number of refinements of the definition of Nash equilibrium. Among the first of these was the notion of subgame perfection, which removes strategy profiles that are not optimal in every subgame of an extensive game's game tree. However, Selten [?] has shown that subgame perfection can still prescribe non-optimizing behaviour at information sets that are not reached when the equilibrium is played. This is because the expected payoff of a player whose information set is not reached does not depend on his choice at that information set, so every choice there maximizes his payoff. As van Damme [19, pp. 8-9] states, this can be remedied by a further requirement. At each information set that is a singleton, the equilibrium must prescribe a choice that maximizes the expected payoff after that information set. The problem is that not all subgame perfect equilibria satisfying this criterion are sensible.

Another approach, suggested by Selten [?], was to eliminate "unreasonable" subgame perfect equilibria by allowing for the possibility of "mistakes" or "trembles" on the part of decision makers. In this way, unreached information sets are eliminated, as every information set can now be reached with positive probability. The other advantage of trembling hand perfection is that, unlike subgame perfection, it can be applied directly to the normal form of any game.

However, as Selten [16] showed (see also van Damme [19]), the perfect equilibria of a game's strategic and extensive forms need not coincide. He did show that an equivalence holds between the perfect equilibria of any extensive game and those of its associated agent normal form [16]. This is because the agent normal form treats each information set of the extensive form as a separate player, or agent. Each agent has the same payoff function as the player to whom that information set belongs.

We let $\Gamma^e$ denote an extensive game consisting of a set of $n$ players and a game tree $K = (T, R)$. The tree consists of a set of nodes $T$ and a binary relation $R$, which is a partial ordering on the set of nodes. The nodes of the game tree are classified as either non-terminal or terminal according to whether or not there are succeeding nodes in the game tree. The partial ordering is used to define a path of successive nodes. The non-terminal nodes of the game tree are partitioned into the sets $P_0, P_1, \ldots, P_n$ that specify the moves associated with each player, with $P_0$ being the set of random moves that are not associated with any player. The players' non-terminal nodes are further divided by the information partition $U = (U_1, \ldots, U_n)$, where each $U_i$ is a partition of $P_i$ into information sets. All nodes within an information set $u \in U_i$ have the same number of immediate successors, and every path intersects an information set at most once. Under the assumption of perfect information, each information set $u \in U_i$ is a singleton. This paper assumes imperfect information: if the information set $u \in U_i$ contains a node $x \in P_i$, player $i$ will not be able to distinguish $x$ from the other nodes contained in this information set, based on the information possessed when moving to $x$. Throughout this paper it will also be assumed that each player has perfect recall, i.e. remembers everything from earlier in the game, including their own moves.

Associated with each random move is a probability distribution $p$. The payoffs associated with the set of terminal points $Z$ of the game tree are denoted by the $n$-tuple $r = (r_1, \ldots, r_n)$, where each player's payoff is a function $r_i(z)$ of the terminal points $z \in Z$. With the information partition $U$, a choice set $C = \{C_u : u \in \cup_{i=1}^{n} U_i\}$ can be defined. Each $C_u$ is a partition of the union of the successor sets $S(x) = \{y : x \in P(y)\}$ over the nodes $x \in u$, i.e. of $\cup_{x \in u} S(x)$. The interpretation is as follows: if player $i$ is at $x \in u$ and takes the choice $c \in C_u$ at information set $u \in U_i$, the next node reached is the element of $S(x)$ contained in $c$. Under the assumptions of imperfect information and perfect recall, a probability distribution on $C_u$ is assigned to each information set $u \in U_i$. This collection of distributions, $b_i$, is a behavioural strategy, and the set of all such strategies for player $i$ is denoted by $B_i$. The profile of all players' behavioural strategies is denoted by $b \in B := \times_{i=1}^{n} B_i$, where $B$ is the set of all behavioural strategy combinations. The probability of a particular realization $z$ of the game $\Gamma^e$ is denoted by $P_b(z)$.

The definition of perfect equilibrium we will use is based on Selten [?] and Friedman [2]. Kuhn [?] has shown that, under perfect recall, behavioural and mixed strategies are realization equivalent. Therefore, for an extensive form game $\Gamma^e$ we let $\Gamma = (S, R)$ denote its strategic form representation, with $S$ denoting the set of all mixed strategy profiles. The payoff profile $R$ is an $n$-tuple whose $i$th element is defined as

$$R_i(b) = \sum_{z \in Z} P_b(z)\, r_i(z).$$

A perturbed game of $\Gamma$ is defined by $(\Gamma, \eta)$, where $\eta$ is a mapping that assigns to every choice $c$ in $\Gamma$ a positive number $\eta_c$ such that

$$\sum_{c \in C_u} \eta_c < 1$$

for every information set $u$. An equilibrium point $b$ of the strategic game $\Gamma$ is a perfect equilibrium if $b$ is a limit point of a sequence $\{b(\eta)\}$ as $\eta \to 0$, where each $b(\eta)$ is an equilibrium point of the associated perturbed game $(\Gamma, \eta)$.

The algorithm is constructed using a simulated annealing algorithm found in van Laarhoven and Aarts [20, p. 10]. The pseudo-code for this algorithm is given below:

begin
  INITIALIZE;
  M := 0;
  repeat
    repeat
      for each player k := 1, ..., n do
      begin
        PERTURB(config. i -> j, ΔR_ij) for player k;
        if ΔR_ij > 0 then accept
        elseif exp(ΔR_ij / c_M) > random[0, 1) then accept;
        if accept then UPDATE(config. j)
      end
    until equilibrium is approached sufficiently closely;
    c_{M+1} := f(c_M);
    M := M + 1
  until stop criterion = true
end
The energy function differential for this algorithm is defined as follows:

$$\Delta R_{ij} = R_j - R_i,$$

where $R_i$ and $R_j$ denote the perturbing player's expected payoff in the perturbed game under the current configuration $i$ and the candidate configuration $j$. The temperature $c$ controls the trembles and is updated by the decrement rule

$$c_{M+1} = \alpha \cdot c_M, \quad 0 < \alpha < 1, \quad M = 1, 2, \ldots.$$
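The pseudo-code above can be rendered as a runnable Python sketch. Several settings are assumptions: the step size, the starting temperature c0, and the decrement factor (called `a` here to avoid clashing with α). The fixed numbers of sweeps and temperature levels are also assumptions; they stand in for the two stopping rules. The demonstration uses hypothetical linear payoffs rather than a game from the text.

```python
import math
import random


def anneal(payoff_fns, n_players, c0=1.0, a=0.9, levels=120, sweeps=200,
           step=0.1, rng=None):
    """Simulated annealing over one mixing probability per player (sketch).

    payoff_fns[k](x) returns player k's expected payoff R at profile x.
    Fixed counts (levels, sweeps) stand in for the stopping rules of the
    pseudo-code; all tuning values are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    x = [0.5] * n_players             # INITIALIZE
    c = c0
    for _ in range(levels):           # outer loop over M
        for _ in range(sweeps):       # inner loop at fixed temperature c_M
            for k in range(n_players):
                y = list(x)           # PERTURB player k's configuration i -> j
                y[k] = min(1.0, max(0.0, x[k] + rng.uniform(-step, step)))
                dR = payoff_fns[k](y) - payoff_fns[k](x)
                if dR > 0 or math.exp(dR / c) > rng.random():
                    x = y             # UPDATE(config. j)
        c *= a                        # decrement rule c_{M+1} = a * c_M
    return x


# Hypothetical two-player example: player 1 gains from raising x[0],
# player 2 loses from raising x[1].
fns = [lambda x: 0.5 * x[0], lambda x: -0.3 * x[1]]
print(anneal(fns, 2))  # close to [1.0, 0.0]
```

At high temperature almost every perturbation is accepted. As c falls, payoff-reducing moves, which play the role of trembles here, become increasingly rare.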

We apply this algorithm to the following example, taken from Friedman [2, p. 51]. The example is based on the three-player extensive form game used by Selten [16] to illustrate the existence of perfect equilibria. The game tree is shown in Figure 4.1 [2, p. 50].







Figure 4.1: Selten's Horse Game Tree 



This game possesses both a perfect equilibrium and "nonsensical" subgame perfect equilibria. The perfect equilibrium for this extensive form game is defined via the perturbed payoff functions

$$R_1 = \alpha_1(1 - \varepsilon_2 - 3\varepsilon_3 + 4\varepsilon_2\varepsilon_3) + 3\varepsilon_3,$$

$$R_2 = 2\varepsilon_3(2 - \varepsilon_1) + \alpha_2(1 - \varepsilon_1 - 4\varepsilon_3 + 4\varepsilon_1\varepsilon_3),$$

$$R_3 = 1 - \varepsilon_1 + \alpha_3(2\varepsilon_1 - \varepsilon_2 + \varepsilon_1\varepsilon_2),$$

where the $\alpha_i$ are the mixed strategies and the $\varepsilon_i$ are errors defined for $i = 1, 2, 3$. Letting the errors approach zero, it can be seen that the perfect equilibrium is $(1, 1, 0)$.
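Each $R_i$ depends linearly on its own $\alpha_i$, so the limiting profile can be checked by reading off the sign of each slope. The sketch below copies $R_1$ to $R_3$ from the text; the tremble path ε = (t, 3t, t) is an assumption. The sign of $\alpha_3$'s coefficient, $2\varepsilon_1 - \varepsilon_2 + \varepsilon_1\varepsilon_2$, depends on the relative size of $\varepsilon_1$ and $\varepsilon_2$. Along this path it stays negative.

```python
def horse_payoffs(alpha, eps):
    """Perturbed payoffs R_1, R_2, R_3 of the horse game, as given in the text."""
    a1, a2, a3 = alpha
    e1, e2, e3 = eps
    R1 = a1 * (1 - e2 - 3 * e3 + 4 * e2 * e3) + 3 * e3
    R2 = 2 * e3 * (2 - e1) + a2 * (1 - e1 - 4 * e3 + 4 * e1 * e3)
    R3 = 1 - e1 + a3 * (2 * e1 - e2 + e1 * e2)
    return R1, R2, R3


def best_response_profile(eps):
    """Each R_i is linear in alpha_i alone, so its maximiser on [0, 1] is 1
    when the slope in alpha_i is positive and 0 otherwise."""
    profile = []
    for i in range(3):
        low = [0.5, 0.5, 0.5]
        high = list(low)
        low[i], high[i] = 0.0, 1.0
        slope = horse_payoffs(high, eps)[i] - horse_payoffs(low, eps)[i]
        profile.append(1 if slope > 0 else 0)
    return tuple(profile)


# Trembles vanishing along the assumed path eps = (t, 3t, t).
for t in (0.1, 0.01, 0.001):
    print(t, best_response_profile((t, 3 * t, t)))  # (1, 1, 0) each time
```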

The results of the simulation are shown in Figure 4.2 and indicate convergence to the trembling hand perfect equilibrium.

Figure 4.2: Three-person game with imperfect competition and payoff solutions

5 Conclusion

This paper has concentrated on some of the underlying theoretical mechanics of simulated annealing and how they relate to the trembling hand perfect refinement of Nash equilibrium. It has been argued that the trembles that underlie global optimization by simulated annealing are analogous to the "mistakes" of trembling hand perfection, in that they provide a means of moving away from local equilibria. The main contribution of this paper has been to apply simulated annealing to solve a game that is known to possess both a perfect equilibrium and "nonsensical" subgame perfect equilibria. Preliminary results indicate convergence to the perfect equilibrium, with mixing occurring for two of the three players.